Efficient Text Compression Using Special Character Replacement and Space Removal

نویسنده

Debashis Chakraborty

چکیده

In this paper, we have proposed a new concept of text compression/decompression algorithm using special character replacement technique. Moreover after the initial compression after replacement of special characters, we remove the spaces between the words in the intermediary compressed file in specific situations to get the final compressed text file. Experimental results show that the proposed algorithm is very simple in implementation, fast in encoding time and high in compression ratio and even gives better compression than existing algorithms like LZW, WINZIP 10.0 and WINRAR 3.93.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Efficient Lossless Colour Image Compression Using Run Length Encoding and Special Character Replacement

Image compression, in the present context of heavy network traffic, is going through major research and development. The lossless compression techniques, presently in practise, follow three basic paradigms – character repetition removal, frequency measurement-encoding and dictionary maintenance. In the proposed method, the character repetition removal and dictionary maintenance concepts were in...

متن کامل

A Novel Approach to Compress Centralized Text Data using Indexed Dictionary

Data compression is very important feature in terms of saving the memory space. In this proposal, an indexed dictionary based compression is used for text data, where the word’s reference in dictionary is used in compression. This approach is not file based; a common dictionary is used for compression. Which contains the words, the position of the word in dictionary is one of the key parts of e...

متن کامل

Domain Specific Hierarchical Huffman Encoding

In this paper, we revisit the classical data compression problem for domain specific texts. It is well-known that classical Huffman algorithm is optimal with respect to prefix encoding and the compression is done at character level. Since many data transfer are domain specific, for example, downloading of lecture notes, web-blogs, etc., it is natural to think of data compression in larger dimen...

متن کامل

A new model for persian multi-part words edition based on statistical machine translation

Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...

متن کامل

Morphological Analysis and Diacritical Arabic Text Compression

Morphological analysis of Arabic words allows decreasing the storage requirements of the Arabic dictionaries, more efficient encoding of diacritical Arabic text, faster spelling and efficient Optical character recognition. All these factors allow efficient storage and archival of multilingual digital libraries that include Arabic texts. This paper presents a lossless compression algorithm based...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2011

Efficient Text Compression Using Special Character Replacement and Space Removal

نویسنده

چکیده

منابع مشابه

Efficient Lossless Colour Image Compression Using Run Length Encoding and Special Character Replacement

A Novel Approach to Compress Centralized Text Data using Indexed Dictionary

Domain Specific Hierarchical Huffman Encoding

A new model for persian multi-part words edition based on statistical machine translation

Morphological Analysis and Diacritical Arabic Text Compression

عنوان ژورنال:

اشتراک گذاری